Distance-based Outlier Detection in Data Streams

Authors

  • Luan Tran
  • Liyue Fan
  • Cyrus Shahabi
Abstract

Continuous outlier detection in data streams has important applications in fraud detection, network security, and public health. The arrival and departure of data objects in a streaming manner impose new challenges for outlier detection algorithms, especially in time and space efficiency. In the past decade, several studies have addressed the problem of distance-based outlier detection in data streams (DODDS), which adopts an unsupervised definition and makes no distributional assumptions about data values. Our work is motivated by the lack of comparative evaluation among the state-of-the-art algorithms using the same datasets on the same platform. We systematically evaluate the most recent algorithms for DODDS under various stream settings and outlier rates. Our extensive results show that in most settings, the MCOD algorithm offers superior performance among all the algorithms, including the most recent algorithm, Thresh LEAP.
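
To make the problem setting concrete, the following is a minimal sketch of a naive baseline, assuming the commonly used distance-based definition in which a point is an outlier if it has fewer than k neighbors within distance R in the current window. It is not MCOD, Thresh LEAP, or any other algorithm evaluated in the paper; the function name `sliding_window_outliers`, the count-based window, and the parameter names are hypothetical choices made for illustration.

```python
from collections import deque
from math import dist


def sliding_window_outliers(stream, window_size, radius, k):
    """Naive distance-based outlier detection over a count-based sliding window.

    Assumed definition: a point is an outlier if fewer than `k` other points
    in the current window lie within Euclidean distance `radius` of it.
    After each arrival, yields (arrival_index, list_of_outliers_in_window).
    """
    # deque with maxlen drops the oldest point automatically (departure).
    window = deque(maxlen=window_size)
    for i, point in enumerate(stream):
        window.append(point)  # arrival of a new data object
        outliers = []
        for p_idx, p in enumerate(window):
            # Count neighbors of p, excluding p itself.
            neighbors = sum(
                1
                for q_idx, q in enumerate(window)
                if q_idx != p_idx and dist(p, q) <= radius
            )
            if neighbors < k:
                outliers.append(p)
        yield i, outliers


if __name__ == "__main__":
    points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0)]
    for step, found in sliding_window_outliers(points, window_size=4, radius=1.0, k=2):
        print(step, found)
```

This baseline recomputes every neighbor count on each arrival, so its cost per slide is quadratic in the window size. That per-slide cost is the kind of time and space overhead that motivates the incremental algorithms compared in the paper.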

Similar resources

A Cluster-based Approach for Outlier Detection in Dynamic Data Streams (KORM: k-median OutlieR Miner)

Outlier detection in data streams has recently gained wide importance due to the increasing number of fraud cases in various data stream applications. Techniques for outlier detection are divided into statistics-based, distance-based, density-based, and deviation-based approaches. Until now, most of the work in the field of fraud detection has been distance-based, but it is incompetent from comput...

DBOD-DS: Distance Based Outlier Detection for Data Streams

The data stream is a newly emerging data model for applications such as environment monitoring, Web click streams, and network traffic monitoring. It consists of an infinite sequence of timestamped data points coming from an external data source. Data sources are typically located onsite and are very vulnerable to external attacks and natural calamities, thus outliers are very common in the da...

A Study on Distance-based Outlier Detection on Uncertain Data

Uncertain data management, querying, and mining have become important because the majority of real-world data is now accompanied by uncertainty. Uncertainty in data is often caused by deficiencies in the underlying data-collecting equipment, or is sometimes introduced manually to preserve data privacy. The uncertainty information in the data is useful and can be used to improve the quality o...

Outlier Detection for Support Vector Machine using Minimum Covariance Determinant Estimator

The purpose of this paper is to identify the points that affect the performance of one of the important algorithms of data mining, namely the support vector machine. The final classification decision is made based on a small portion of the data called support vectors. The existence of atypical observations among these points will therefore result in deviation from the correct decision. Thus...

A Review on Detection of Outliers Over High Dimensional Streaming Data Using Cluster Based Hybrid Approach

Outlier detection in data streams has recently gained broad importance due to the increasing number of fraud cases in various applications of data streams, such as data cleaning, network monitoring, invasive species monitoring, stock market analysis, and detecting outlying cases in medical data. Finding outliers in a collection of patterns is a very well-known problem in the data mining field. An outl...

Journal:
  • PVLDB

Volume 9

Publication date: 2016